Merged
…lvm#181252) Replace manual region dissolution code in simplifyBranchConditionForVFAndUF with the general removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates a (BranchOnCond true) or updates BranchOnTwoConds. The loop is then removed automatically by running removeBranchOnConst. This removes a bunch of special logic to handle header phi replacements and CFG updates. With the new code, there's no restriction on what kind of header phi recipes the loop contains. Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is technically unrelated, but I could not find an independent test that would be impacted. The code to deal with epilogue resume values now needs updating, because we may simplify a reduction directly to the start value. PR: llvm#181252
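The effect of folding a constant branch can be sketched on a toy CFG. This is a minimal illustration under stated assumptions, not VPlan code: `Block`, `CFG`, and `removeBranchOnConstSketch` are hypothetical names, and a two-successor branch is assumed to take its first successor when the condition is true. In the usage below, the latch's first successor is the exit, so folding its constant-true branch drops the backedge and the cycle disappears; any block that becomes unreachable is erased.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy CFG model (hypothetical; not VPlan's data structures).
struct Block {
  std::vector<std::string> Succs; // for a conditional branch, Succs[0] is taken when true
  std::optional<bool> ConstCond;  // set when the terminator branches on a known constant
};
using CFG = std::map<std::string, Block>;

// Sketch of the idea: fold branches on constant conditions into
// unconditional ones, then erase blocks that are no longer reachable.
void removeBranchOnConstSketch(CFG &G, const std::string &Entry) {
  for (auto &KV : G) {
    Block &B = KV.second;
    if (B.ConstCond && B.Succs.size() == 2) {
      std::string Taken = *B.ConstCond ? B.Succs[0] : B.Succs[1];
      B.Succs = {Taken};
      B.ConstCond.reset();
    }
  }
  // Collect blocks reachable from Entry with a simple worklist.
  std::set<std::string> Reachable;
  std::vector<std::string> Work{Entry};
  while (!Work.empty()) {
    std::string N = Work.back();
    Work.pop_back();
    if (!Reachable.insert(N).second)
      continue;
    for (const std::string &S : G.at(N).Succs)
      Work.push_back(S);
  }
  // Erase unreachable blocks.
  for (auto It = G.begin(); It != G.end();) {
    if (Reachable.count(It->first))
      ++It;
    else
      It = G.erase(It);
  }
}
```

For example, with `ph -> header -> latch` and `latch` branching on constant `true` to `{exit, header}`, the latch ends up with `exit` as its only successor and all four blocks survive; with an entry block branching on constant `false` to `{a, b}`, block `a` is erased.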
Currently for ThinLTO, imported static global values (functions, variables, etc.) are promoted/renamed, e.g. from foo() to foo.llvm.<hash>(). Such renaming causes difficulties for live patching, since the function name changes ([1]). Some global value names do have to be promoted to avoid name collisions and linker failures, but in practice the majority of name promotions can be avoided. In [2], the suggestion is that the ThinLTO pre-link step decides whether a particular global value needs name promotion; if it does, the name is promoted later in thinBackend(). I compiled a particular Linux kernel version (latest bpf-next tree) and found 1216 global values with the suffix .llvm.<hash>. With this patch, the number of promoted functions is 2, a 98% reduction from the original kernel build. If some native objects do not participate in LTO, name promotions have to be done to avoid potential linker issues, so the current implementation cannot be enabled by default. But in certain cases, e.g. a Linux kernel build, people can enable the lld flag --lto-whole-program-visibility to reduce the number of functions like foo.llvm.<hash>(). Reducing promotion-induced renaming is not implemented for ThinLTOCodeGenerator.cpp (used by the llvm-lto tool and a few other rare cases), because the lld flag '-lto-whole-program-visibility' is not supported there for now. In summary, this pull request only supports the llvm-lto2 style workflow. [1] https://lpc.events/event/19/contributions/2212 [2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400
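One plausible pre-link decision rule can be sketched as follows. This is a hypothetical illustration, not the patch's logic: `LocalSymbol`, `promotedNames`, and the rule itself (promote only locals referenced from another module, and only when whole-program visibility is not asserted or the same local name is defined in more than one module) are assumptions. Only the `.llvm.<hash>` suffix format comes from the description above.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of a module-local symbol.
struct LocalSymbol {
  std::string Name;
  std::string ModuleHash;  // stand-in for the <hash> in foo.llvm.<hash>
  bool ImportedElsewhere;  // referenced from another module after importing
};

// Assumed rule: a local that is never referenced from another module keeps
// its name. Otherwise, promote unless whole-program visibility is asserted
// and the name is unique across all modules.
std::vector<std::string> promotedNames(const std::vector<LocalSymbol> &Syms,
                                       bool WholeProgramVisibility) {
  std::map<std::string, int> Count;
  for (const auto &S : Syms)
    ++Count[S.Name];
  std::vector<std::string> Out;
  for (const auto &S : Syms) {
    bool NeedsPromotion =
        S.ImportedElsewhere && (!WholeProgramVisibility || Count[S.Name] > 1);
    Out.push_back(NeedsPromotion ? S.Name + ".llvm." + S.ModuleHash : S.Name);
  }
  return Out;
}
```

Under this rule, asserting whole-program visibility leaves a unique `foo` unrenamed, while two modules that both define a local `bar` still get `bar.llvm.<hash>` names to avoid a collision.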
…ions" (llvm#183774) Reverts llvm#183567 due to a UBSan failure.
Update the test to more cleanly handle making a 'blocking' call by using a custom command instead of Python `time.sleep`, which we cannot easily interrupt. This should improve the overall performance of the tests: locally they took around 30s and now finish in around 6s.
…cross incremental scans (llvm#183328) Add a test that verifies symlink aliases to a module map directory produce the same PCM across incremental scans.
When there's a dependency cycle between modules, the dependency scanner may encounter a deadlock. This was caused by not respecting the lock timeout. But even with the timeout implemented, leaving `unsafeMaybeUnlock()` unimplemented means trying to take a lock after a timeout would still fail and prevent making progress. This PR implements this API in a way that avoids UB on `std::mutex` (when it's unlocked by a thread other than its owner). Lastly, this PR makes sure that `unsafeUnlock()` ends the wait of existing threads, so that they don't need to hit the full timeout amount. This PR also implements `-fimplicit-modules-lock-timeout=<seconds>` that allows tweaking the default 90-second lock timeout, and adds `#pragma clang __debug sleep` that makes it easier to achieve desired execution ordering. rdar://170738600
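A lock that any thread may release can avoid undefined behavior on `std::mutex` by keeping the "held" state in a flag protected by a mutex and a condition variable. This is a minimal sketch under that assumption, not the dependency scanner's implementation: `TimedReleasableLock` and `tryLockFor` are hypothetical names, and `unsafeUnlock` only echoes the API named above. Notifying waiters on unlock is what lets them proceed without sitting out the full timeout.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: the logical lock is a plain flag. The std::mutex is
// only held briefly by whichever thread is calling, so releasing the logical
// lock from a non-owning thread is well defined.
class TimedReleasableLock {
  std::mutex M;
  std::condition_variable CV;
  bool Held = false;

public:
  // Waits up to Timeout for the lock; returns false if it is still held.
  bool tryLockFor(std::chrono::milliseconds Timeout) {
    std::unique_lock<std::mutex> L(M);
    if (!CV.wait_for(L, Timeout, [this] { return !Held; }))
      return false;
    Held = true;
    return true;
  }

  // Callable from any thread, including one that never acquired the lock.
  void unsafeUnlock() {
    {
      std::lock_guard<std::mutex> L(M);
      Held = false;
    }
    // Wake all waiters so they re-check now instead of waiting out the timeout.
    CV.notify_all();
  }
};
```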
…part 22) (llvm#183681) Tests converted from test/Lower: intentout-deallocate.f90 Tests converted from test/Lower/Intrinsics: abs.f90, achar.f90, acospi.f90, adjustl.f90
Part 1 of changes needed for the USM alloc/dealloc implementation. This is part of the SYCL support upstreaming effort. The relevant RFCs can be found here: https://discourse.llvm.org/t/rfc-add-full-support-for-the-sycl-programming-model/74080 https://discourse.llvm.org/t/rfc-sycl-runtime-upstreaming/74479 --------- Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
Enable Flang to match Clang behavior for command-line recording in DWARF producer strings when using -grecord-command-line. Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Just like other bitcode libs such as ockl.bc and ocml.bc, link asanrtl.bc with '-mlink-builtin-bitcode' in the driver when GPU ASan is enabled.
…llvm#183781) Two related crashes were fixed in vector.mask handling: 1. MaskOp::fold() crashes with a null pointer dereference when the mask is all-true and the mask body has no maskable operation (only a vector.yield). getMaskableOp() returns nullptr in this case, and the fold was calling nullptr->dropAllUses(). Fixed by returning failure() when there is no maskable op, deferring to the canonicalizer. 2. CanonializeEmptyMaskOp creates an invalid arith.select when the mask type is a vector (e.g., vector<1xi1>) but the result type is a scalar (e.g., i32). arith.select with a vector condition requires the value types to be vectors of the same shape. Fixed by bailing out when any result type doesn't match the mask shape. Regression tests are added for both cases. Fixes llvm#177833
…83199) Using physical register 0, aka NoRegister, also just looked suspicious.
…vm#178587)" (llvm#183782) There is a conflict with existing code; see llvm#178587. Reverting to resolve the conflict; the change will be resubmitted later.
This allows us to support more lifetimes, and also gets rid of the quadratic call to isPotentiallyReachable. Reviewers: pcc, usama54321 Reviewed By: pcc Pull Request: llvm#182425
Instead of excluding the whole package, push any existing parse_headers failures to individual targets. In some cases we can avoid suppressing a target by adding a few missing deps.
… support (llvm#183442) This is the second of three patches aimed at supporting indirect symbol handling for the SystemZ backend. It adds an external name to both MC sections and symbols, and makes the relevant printers and writers use the external name when present. Furthermore, the ALIAS HLASM instruction is emitted after every XATTR instruction. Depends on llvm#183441.
…4171) When hoisting loop invariant instructions, we can preserve profile metadata because it depends solely on the condition (which is loop invariant) rather than where we are in the control flow graph.
…x) (llvm#183363) Add a pre-commit test case for the inefficient asm generated for std::bit_floor(x) on PowerPC.
) Summary: This primarily enables `stop.cpp` and `descriptor.cpp`. It requires a little bit of wrangling to get it to compile. Unlike the CUDA build, this build uses an in-tree libc++ configured for the GPU. This is configured without thread support, environment, or filesystem, and it is not POSIX at all. So, no mutexes, pthreads, or get/setenv. I tested stop, but I don't know if it's actually legal to exit from OpenMP offloading.
…m#182512) LLVM converts a sqrt libcall to an intrinsic call if the argument is known to be in range (greater than or equal to 0.0). In this case, the compiler was not able to deduce the non-negativity on its own. Extended ValueTracking to understand such loops. Added new APIs for matching intrinsics with three operands (these previously existed only for two operands): `matchSimpleTernaryIntrinsicRecurrence` and `matchThreeInputRecurrence`. Fixes llvm#174813
…lvm#181030) This implements the TOKENIZE intrinsic per the Fortran 2023 Standard. TOKENIZE is a more complicated addition to the flang intrinsics, as it is the first subroutine that has multiple unique footprints. Intrinsic functions have already addressed this challenge; however, subroutines and functions are processed slightly differently, and the function code was not a good 1:1 solution for the subroutines. To solve this, the function code was used as an example to create error buffering within the intrinsics Process and to select the most appropriate error message for a given subroutine footprint. A simple FIR compile test was added to show the proper compilation of each case. A thorough negative-path test has also been added, ensuring that all possible errors are reported as expected. Testing prior to commit:
= check-flang ==========================================
```
Testing Time: 139.51s
Total Discovered Tests: 4153
Unsupported : 77 (1.85%)
Passed : 4065 (97.88%)
Expectedly Failed: 11 (0.26%)
FLANG Container Test completed 2 minutes (160 s).
Total Time: 2 minutes (160 s)
Completed : Wed Feb 11 04:05:50 PM CST 2026
```
= check-flang-rt ==========================================
```
Testing Time: 1.55s
Total Discovered Tests: 258
Passed: 258 (100.00%)
FLANG Container Test completed 0 minutes (55 s).
Total Time: 0 minutes (56 s)
Completed : Wed Feb 11 04:08:32 PM CST 2026
```
= llvm-test-suite ==========================================
```
Testing Time: 1886.64s
Total Discovered Tests: 6926
Passed: 6926 (100.00%)
CCE SLES Container debug compile completed 31 minutes (1895 s).
CCE SLES Container debug install completed in 0 minutes (0 s).
Total Time: 31 minutes (1895 s)
Completed : Wed Feb 11 05:46:52 PM CST 2026
```
Additionally, (FYI) an executable test has been written and will be added to the llvm-test-suite under a separate PR. --------- Co-authored-by: Kevin Wyatt <kwyatt@hpe.com>
…#183176) Adjusts `VariableReferenceStorage` so that it only needs to track permanent vs. temporary storage, by making `VariableStore` the common base class. Moved the subclasses of `VariableStore` into the Variables.cpp file, since they're no longer referenced externally. Expands the tests by adding an updated core dump with variables in the argument scope, which we can use to validate variable storage.
…183405) This commit updates the LLVM::decomposeValue and LLVM::composeValue methods to handle aggregate types (LLVM arrays and structs), and to handle types that can't be bitcast to fixed-size integers, such as pointers, differently. This allows the "any type" on gpu.subgroup_broadcast to be more comprehensive; for example, you can broadcast a memref to a subgroup by decomposing it. (This branched off of getting an LLM to implement ValueBoundsOpInterface on subgroup_broadcast, having it add handling for the dimensions of shaped types, and realizing that there's no fundamental reason you can't broadcast a memref or the like.) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…low (llvm#181755) Rather than mapping out full "reachability" between blocks in a region to find loops and using `LoopBlocks` to find the bodies of said loops, use SCCs (strongly connected components) to provide this information. This brings in LLVM's generic `SCCIterator` (which uses Tarjan's algorithm) as the implementation for sorting the basic blocks of the CFG into their SCCs. This PR greatly reduces the compile-time footprint of the pass, making memory use and time taken negligible where it might previously have caused stalls and OOM (e.g. llvm#47793, usagi-coffee/tree-sitter-abl#114) ------ Supersedes llvm#179722 Fixes llvm#47793 Fixes llvm#165041 (probably) Thanks to @jkbz64 for the initial investigations (w/ AI; see llvm#179722) into why this pass was slow and memory consuming and showing that SCCs were the key. Also thanks to the Cheerp compiler project for bringing `SCCIterator` to light in this context ([blog post](https://cheerp.io/blog/control-flow#fix-the-irreducible-control-flow), [implementation](https://github.com/leaningtech/cheerp-compiler/blob/master/llvm/lib/CheerpUtils/FixIrreducibleControlFlow.cpp)).
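For readers unfamiliar with the technique, Tarjan's algorithm finds all SCCs in a single depth-first traversal, in linear time. The sketch below is a self-contained version over an adjacency list of block numbers, not LLVM's `scc_iterator`; `tarjanSCCs` is a hypothetical name. As with `scc_iterator`, SCCs are emitted in reverse topological order.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Tarjan's SCC algorithm. Succs[V] lists the successors of block V.
// Returns SCCs in the order they are completed (reverse topological order).
std::vector<std::vector<int>> tarjanSCCs(const std::vector<std::vector<int>> &Succs) {
  int N = static_cast<int>(Succs.size());
  std::vector<int> Index(N, -1), Low(N, 0); // DFS discovery index and low-link
  std::vector<bool> OnStack(N, false);
  std::vector<int> Stack;
  std::vector<std::vector<int>> SCCs;
  int Next = 0;

  std::function<void(int)> Visit = [&](int V) {
    Index[V] = Low[V] = Next++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : Succs[V]) {
      if (Index[W] == -1) {          // tree edge: recurse
        Visit(W);
        Low[V] = std::min(Low[V], Low[W]);
      } else if (OnStack[W]) {       // edge back into the current SCC
        Low[V] = std::min(Low[V], Index[W]);
      }
    }
    if (Low[V] == Index[V]) {        // V is the root of an SCC: pop it
      std::vector<int> SCC;
      int W;
      do {
        W = Stack.back();
        Stack.pop_back();
        OnStack[W] = false;
        SCC.push_back(W);
      } while (W != V);
      SCCs.push_back(SCC);
    }
  };

  for (int V = 0; V < N; ++V)
    if (Index[V] == -1)
      Visit(V);
  return SCCs;
}
```

In a CFG, an SCC with more than one block, or a single block with a self-edge, corresponds to a cycle, which is how SCCs can stand in for the reachability-based loop detection. For the graph `0 -> 1 -> 2 -> {1, 3}`, the blocks `{1, 2}` form one SCC.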
Fix linking of 'ockl.bc' for OpenMP by switching from `-mlink-bitcode-file` to `-mlink-builtin-bitcode`
…lvm#182640) This patch makes it so that, when instructions are inserted into the SlotIndexes analysis, the renumbering step renumbers the entire list if the list is otherwise densely packed. This fixes a case we saw on AArch64 with a lot of spills, where every single spill instruction insertion required renumbering most of the instructions in a large function, making the operation approximately quadratic. This is not NFC, as some heuristics depend on the SlotIndex numbers, although this should mostly be a wash since live ranges (LRs) should be extended roughly equally.
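The idea of gap-based numbering can be shown with a toy model. This sketch assumes a fixed spacing of 16 between neighbors and always renumbers the whole list when a new entry has no free value between its neighbors; `Numbering`, `insertAfter`, and `renumberAll` are hypothetical and do not reflect the actual spacing or local-renumbering heuristics in `SlotIndexes`. Restoring uniform gaps across the whole list means later insertions at the same spot are cheap again, instead of repeatedly renumbering a large neighborhood.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Toy model of gap-based instruction numbering (hypothetical; not SlotIndexes).
struct Numbering {
  static constexpr unsigned Spacing = 16; // assumed gap after a renumbering
  std::list<unsigned> Idx;                // strictly increasing index values
  unsigned Renumbers = 0;                 // number of full renumberings so far

  // Inserts a new entry after position Pos (0-based) and returns its value.
  unsigned insertAfter(std::size_t Pos) {
    auto It = std::next(Idx.begin(), static_cast<std::ptrdiff_t>(Pos));
    auto NextIt = std::next(It);
    auto Upper = [&] { return NextIt == Idx.end() ? *It + 2 * Spacing : *NextIt; };
    if (Upper() - *It < 2) // no free value strictly between the neighbors
      renumberAll();
    // Take the midpoint of the (possibly renumbered) gap.
    return *Idx.insert(NextIt, *It + (Upper() - *It) / 2);
  }

  // Reassigns every entry so that neighbors are Spacing apart again.
  void renumberAll() {
    unsigned V = 0;
    for (unsigned &I : Idx) {
      I = V;
      V += Spacing;
    }
    ++Renumbers;
  }
};
```

Starting from `{0, 16, 32}`, inserting ten entries right after the first one triggers only two full renumberings in this model.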
This PR adds `JSONFormat` support for reading and writing `TUSummaryEncoding`. The implementation exploits similarities in the structures of `TUSummary` and `TUSummaryEncoding` by reusing existing `JSONFormat` support for `TUSummary`. Duplication of tests has been avoided by parameterizing the test fixture that runs all relevant read/write tests against `TUSummary`, for `TUSummaryEncoding`. This ensures that the two serialization paths remain in lockstep.
After header search has found a header it looks for module maps that cover that header. This patch uses the parsed representation of module maps to do this search instead of relying on FileEntryRef lookups after stating headers in module maps. This behavior is currently gated behind the `-fmodules-lazy-load-module-maps` `-cc1` flag.
…perand is a block argument of its successor (llvm#183797) When `simplifyBrToBlockWithSinglePred` merges a block into its sole predecessor, it calls `inlineBlockBefore` which replaces each block argument with the corresponding value passed by the branch. If one of those values is itself a block argument of the successor block, the call `replaceAllUsesWith(arg, arg)` is a no-op. Any uses of that argument outside the block (e.g. in a downstream block) are therefore not replaced, and when the successor block is erased the argument is destroyed while those uses are still live, triggering the assertion `use_empty() && "Cannot destroy a value that still has uses!"` in `IRObjectWithUseList::~IRObjectWithUseList`. Guard against this by returning early when any branch operand is a block argument owned by the destination block. Fixes llvm#126213
…m#181177) Checks that isReversibleBranch() returns false:
- when the immediate value is 63 and needs +1 adjustment
- when the immediate value is 0 and needs -1 adjustment
Checks that reverseBranchCondition() adjusts:
- the opcode
- the immediate operand if necessary (+/-1)
- the register operands if necessary (swap)
This variable ends up being unused in builds without assertions. Mark it [[maybe_unused]] per the coding standards.
Reviewers: Pull Request: llvm#183807
Reviewers: Pull Request: llvm#183808
…and UF. (llvm#181252)" This reverts commit 9c53215. It appears to cause crashes with ordered reductions; reverting while I investigate.
…m#183825) As pointed out in the reviews for llvm#183405, decomposeValues and composeValues should be able to emit zexts and truncations for cases like i48 and vector<3xi16> becoming i32s, but currently that case hits an assert. This commit fixes that limitation. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
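The zext/trunc handling can be illustrated on plain integers. In this hypothetical sketch (`decomposeToI32` and `composeFromI32` are illustrative names, not the MLIR API), an N-bit value with N at most 64 is zero-extended to the next multiple of 32 bits and split into 32-bit pieces, least significant piece first; recombining truncates back to N bits. An i48, or a vector<3xi16> viewed as 48 bits, becomes two i32 pieces.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Split an N-bit value (1 <= N <= 64) into 32-bit pieces, low piece first.
std::vector<std::uint32_t> decomposeToI32(std::uint64_t Value, unsigned Bits) {
  if (Bits < 64)
    Value &= (std::uint64_t{1} << Bits) - 1; // keep only the N-bit payload
  unsigned Padded = (Bits + 31) / 32 * 32;   // "zext": 48 -> 64, 16 -> 32
  std::vector<std::uint32_t> Parts;
  for (unsigned Off = 0; Off < Padded; Off += 32)
    Parts.push_back(static_cast<std::uint32_t>(Value >> Off));
  return Parts;
}

// Reassemble up to two pieces and "trunc" the result back to N bits.
std::uint64_t composeFromI32(const std::vector<std::uint32_t> &Parts, unsigned Bits) {
  std::uint64_t Value = 0;
  for (std::size_t I = 0; I < Parts.size() && I < 2; ++I)
    Value |= static_cast<std::uint64_t>(Parts[I]) << (32 * I);
  if (Bits < 64)
    Value &= (std::uint64_t{1} << Bits) - 1;
  return Value;
}
```

For example, the 48-bit value `0x123456789ABC` splits into `0x56789ABC` and `0x1234` and recombines to the original value.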
Account for masked VPInstruction when verifying the operands in the constructor. Fixes a crash when trying to unroll VPlans for predicated early exits.
The `exact` flag, with the following semantics:
> If the `exact` attribute is present, it is assumed that the index type width
> is such that the conversion does not lose information. When this assumption
> is violated, the result is poison.
can be added to index_cast and index_castui operations. This unlocks the following lowerings:
* index_cast (signed) exact -> trunc nsw
* index_castui (unsigned) exact -> trunc nuw
* index_castui nneg exact -> trunc nuw nsw
Changes:
* Adds ArithExactFlagInterface.
* Updates Arith_IntBinaryOpWithExactFlag to use ArithExactFlagInterface.
* Updates IndexCastOp and IndexCastUIOp to declare `ArithExactFlagInterface`.
* Updates canonicalization patterns.
* Updates roundtrip, lowering, and canonicalization tests.
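The poison conditions that these lowerings rely on can be modeled directly. Per the LLVM LangRef, `trunc nuw` is poison if any truncated bit is non-zero, and `trunc nsw` is poison if any truncated bit differs from the sign bit of the result. The sketch below checks both conditions for an i64-to-i32 truncation; the function names are hypothetical.

```cpp
#include <cstdint>

// trunc nuw i64 -> i32 is well defined only if the discarded high bits are zero.
bool truncNuwIsDefined(std::uint64_t V) { return (V >> 32) == 0; }

// trunc nsw i64 -> i32 is well defined only if every discarded bit equals the
// sign bit of the i32 result, i.e. the value survives a trunc/sext round trip.
bool truncNswIsDefined(std::uint64_t V) {
  auto Low = static_cast<std::int32_t>(static_cast<std::uint32_t>(V));
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(Low)) == V;
}
```

For example, `0xFFFFFFFF` (4294967295 as i64) is fine for `trunc nuw` but poison for `trunc nsw`, while all-ones (-1 as i64) is the reverse.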
Updates formatter_bytecode.py to support compilation and disassembly for synthetic formatters, in other words support for multiple functions (signatures). This includes a number of other changes:
* String parsing and encoding bugs are fixed
* CLI args are updated, primarily to support an output file
* Added ULEB encoding/decoding support
This work is a prelude to the ongoing work on a Python-to-formatter-bytecode compiler. The Python compiler will emit assembly, and this module (formatter_bytecode) will compile it into binary bytecode.
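ULEB128 stores an unsigned integer in 7-bit groups, least significant group first, with the high bit of each byte set when more bytes follow. The C++ sketch below shows the standard scheme; it is not the Python code in formatter_bytecode.py, and `encodeUleb`/`decodeUleb` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

std::vector<std::uint8_t> encodeUleb(std::uint64_t V) {
  std::vector<std::uint8_t> Out;
  do {
    std::uint8_t Byte = V & 0x7f; // low 7 bits
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;               // continuation bit: more bytes follow
    Out.push_back(Byte);
  } while (V != 0);
  return Out;
}

// Decodes one value; on success sets Len to the number of bytes consumed.
std::optional<std::uint64_t> decodeUleb(const std::vector<std::uint8_t> &In,
                                        std::size_t &Len) {
  std::uint64_t V = 0;
  unsigned Shift = 0;
  for (std::size_t I = 0; I < In.size(); ++I) {
    if (Shift >= 64)
      return std::nullopt;        // too many bytes for a 64-bit value
    V |= static_cast<std::uint64_t>(In[I] & 0x7f) << Shift;
    if ((In[I] & 0x80) == 0) {
      Len = I + 1;
      return V;
    }
    Shift += 7;
  }
  return std::nullopt;            // input ended while a continuation bit was set
}
```

For example, 624485 encodes as the bytes `E5 8E 26`, and 128 encodes as `80 01`.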
Fixing test failures on my local desktop with incremental building.
Collaborator dpalermo approved these changes on Feb 27, 2026.
No description provided.